Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automated retries #1776

Open
wants to merge 30 commits into
base: master
Choose a base branch
from
Open

Automated retries #1776

wants to merge 30 commits into from

Conversation

ekzyis
Copy link
Member

@ekzyis ekzyis commented Dec 30, 2024

Description

close #1492 based on #1785, #1787

All failed invoices are returned to every client periodically via a new query failedInvoices. To make sure an invoice is only retried by one client at a time, we implemented expiring locks: we first try to update a new timestamp column retryPendingSince on the invoice. Only one client at a time will be able to successfully update the invoice. This lock expires after a minute after which another retry for the same invoice can be attempted by a client.

To stop after three payment attempts (= two retries), a new integer column Invoice.paymentAttempt tracks at which payment attempt we are. This number is increased when we retry an invoice and pass newAttempt: true which we do when we retry these fetched failed invoices. When the number is increased, the payment will start from the beginning with all sender and receiver wallets available.

TODO:

  • retry returned failed invoices on client
  • count retries so we can stop returning them after we hit a limit
  • only show failed invoices that have exhausted retries in notifications
  • don't retry invoices that have been manually cancelled?

added userCancel column, see #1785

  • don't return failed invoices to clients that don't have send wallets

client only polls when it has send wallets

  • don't return intermediate failed invoices that will be retried due to sender or receiver fallbacks

added "cancelledAt" < now() - interval '${WALLET_RETRY_AFTER_MS} milliseconds' filter

  • ⚠️ make sure when this is deployed, this will not start retrying old failed invoices ⚠️

added WALLET_RETRY_BEFORE_MS used in this filter:

AND now()::timestamp <@ tsrange(
  "cancelledAt" + interval '${WALLET_RETRY_AFTER_MS} milliseconds',
  "cancelledAt" + interval '${WALLET_RETRY_BEFORE_MS} milliseconds'
)
  • use lock in retry mutation

new TODO since feedback:

  • also update hasNewNotes
  • fix no notification if no wallets are attached and QR payment fails since they are never retried

Additional Context

  • this will still retry invoices that have been created before deployment and failed if they aren't older than a day
  • if a client detaches all wallets after they tried a payment for the first time, it will never show up in notifications as failed since it wasn't retried often enough. I consider this to be an edge case we don't need to worry about.
  • see https://github.com/stackernews/stacker.news/pull/1776/files#r1907791409

Checklist

Are your changes backwards compatible? Please answer below:

yes

On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:

7. Tested automated retries for posting, replies and zapping with this patch:

diff --git a/api/resolvers/wallet.js b/api/resolvers/wallet.js
index 8e6f3f0e..fb9edd2f 100644
--- a/api/resolvers/wallet.js
+++ b/api/resolvers/wallet.js
@@ -469,7 +469,6 @@ const resolvers = {
         SELECT * FROM "Invoice"
         WHERE "userId" = ${me.id}
         AND "actionState" = 'FAILED'
-        AND "userCancel" = false
         AND "cancelledAt" < now() - ${`${WALLET_RETRY_AFTER_MS} milliseconds`}::interval
         AND "cancelledAt" > now() - ${`${WALLET_RETRY_BEFORE_MS} milliseconds`}::interval
         AND "paymentAttempt" < ${WALLET_MAX_RETRIES}
diff --git a/lib/constants.js b/lib/constants.js
index fe46b71a..09b6981a 100644
--- a/lib/constants.js
+++ b/lib/constants.js
@@ -198,7 +198,7 @@ export const WALLET_CREATE_INVOICE_TIMEOUT_MS = 45_000
 // interval between which failed invoices are returned to a client for automated retries.
 // retry-after must be high enough such that intermediate failed invoices that will already
 // be retried by the client due to sender or receiver fallbacks are not returned to the client.
-export const WALLET_RETRY_AFTER_MS = 60_000 // 1 minute
+export const WALLET_RETRY_AFTER_MS = 5_000 // 1 minute
 export const WALLET_RETRY_BEFORE_MS = 86_400_000 // 24 hours
 // we want to attempt a payment three times so we retry two times
 export const WALLET_MAX_RETRIES = 2
@@ -277,7 +277,7 @@ function RetryHandler ({ children }) {
           console.error('retry poll failed:', err)
         }
         if (!stopped) queuePoll()
-      }, NORMAL_POLL_INTERVAL)
+      }, 1_000)
     }
     const stopPolling = () => {
       stopped = true

Simulate multiple clients with this patch:

diff --git a/wallets/index.js b/wallets/index.js
index 42fb55be..8a9f5f3f 100644
--- a/wallets/index.js
+++ b/wallets/index.js
@@ -255,9 +255,9 @@ function RetryHandler ({ children }) {
         return
       }

-      for (const inv of failedInvoices) {
+      for (const inv of [...failedInvoices, ...failedInvoices]) {
         try {
-          await retry(inv)
+          retry(inv).catch(err => console.error('retry failed:', err))
         } catch (err) {
           // some retries are expected to fail since only one client at a time is allowed to retry
           // these should show up as 'invoice not found' errors

Test p2p zaps with this patch that makes forwards fail and disables the fallback to CCs:

diff --git a/api/paidAction/index.js b/api/paidAction/index.js
index 8feabc30..031e4d7e 100644
--- a/api/paidAction/index.js
+++ b/api/paidAction/index.js
@@ -355,6 +355,7 @@ export async function retryPaidAction (actionType, args, incomingContext) {
       invoiceArgs = { bolt11, wrappedBolt11, wallet, maxFee }
     } catch (err) {
       console.log('failed to retry wrapped invoice, falling back to SN:', err)
+      throw err
     }
   }
diff --git a/worker/paidAction.js b/worker/paidAction.js
index 9b3ecb5a..b9172ebf 100644
--- a/worker/paidAction.js
+++ b/worker/paidAction.js
@@ -268,7 +268,7 @@ export async function paidActionForwarding ({ data: { invoiceId, ...args }, mode
     payViaPaymentRequest({
       lnd,
       request: bolt11,
-      max_fee_mtokens: String(maxFeeMsats),
+      max_fee_mtokens: String(0),
       pathfinding_timeout: LND_PATHFINDING_TIMEOUT_MS,
       confidence: LND_PATHFINDING_TIME_PREF_PPM,
       max_timeout_height: maxTimeoutHeight

For frontend changes: Tested on mobile, light and dark mode? Please answer below:

n/a

Did you introduce any new environment variables? If so, call them out explicitly here:

no

@ekzyis ekzyis added feature new product features that weren't there before wallets labels Dec 30, 2024
@ekzyis ekzyis marked this pull request as draft December 30, 2024 03:28
@ekzyis ekzyis force-pushed the automated-retries branch 4 times, most recently from eb65eb5 to 7c875d1 Compare January 1, 2025 18:02
@ekzyis ekzyis force-pushed the automated-retries branch from 7c875d1 to c0157fc Compare January 1, 2025 18:28
@ekzyis ekzyis force-pushed the automated-retries branch 10 times, most recently from 8757074 to 09cb758 Compare January 8, 2025 18:44
@ekzyis ekzyis marked this pull request as ready for review January 8, 2025 19:54
@ekzyis ekzyis requested a review from huumn January 8, 2025 19:54
api/resolvers/wallet.js Outdated Show resolved Hide resolved
wallets/index.js Outdated Show resolved Hide resolved
wallets/index.js Outdated Show resolved Hide resolved
wallets/index.js Show resolved Hide resolved
@ekzyis ekzyis force-pushed the automated-retries branch from f79f837 to ccbde0d Compare January 9, 2025 01:22
api/resolvers/wallet.js Outdated Show resolved Hide resolved
@ekzyis ekzyis force-pushed the automated-retries branch from ccbde0d to 15c799d Compare January 13, 2025 10:45
If a stacker has no send wallets, they would miss notifications about failed payments because they would never get retried.

This commit fixes this by making the notifications query aware if the stacker has send wallets. This way, it can tell if a notification will be retried or not.
As mentioned in a previous commit, I want to show anything that will not be attempted anymore in notifications.

Before, I wanted to hide manually cancelled invoices but to not change experience unnecessarily and to decrease mental overhead, I changed my mind.
@huumn
Copy link
Member

huumn commented Feb 12, 2025

How do you think about setHasSendWallets when there are multiple devices without device sync? I'm having trouble reasoning through it.

@ekzyis
Copy link
Member Author

ekzyis commented Feb 12, 2025

Notifications are now aware of if a stacker has send wallets via a new column users.sendWallets. I didn't simply pass this information as a GraphQL variable because notifications are rendered on the server and I didn't want to mess with that. Searching for getGetServerSideProps looked like we never pass GraphQL variables to it.1

With this information, I simplified the notification logic such that filter for failed payment notifications is exactly the inverse of the filter for when automated retries should be attempted. To achieve this, I also reverted my decision to not show failed payments with userCancel set. They will not get automatically retried and thus should show up in notifications.

This means that if someone enables a send wallet, some notifications might disappear if they will now get automatically retried and vice versa. I think this is not bad UX but actually good.

edit: just saw your reply

How do you think about setHasSendWallets when there are multiple devices without device sync? I'm having trouble reasoning through it.

oof, you are right, a device with no send wallets would overwrite that information 🙄

Footnotes

  1. Additionally, user.sendWallets is nice information for us 👀

@huumn
Copy link
Member

huumn commented Feb 12, 2025

Additionally, user.sendWallets is nice information for us.

I agree.

Anyway, to add some brainstorming. In a situation where we're worried about invoice "waste," don't want to retry without send wallets, and I'm thinking about the refactor, I think we should just send hasSendWallet with paid actions. We already do it with zaps.

We can probably write a graphql interface for paid action input types that requires hasSendWallet. retryPaidAction can also take it so that retries can be auto-retried if a wallet is added/found later.

@ekzyis ekzyis force-pushed the automated-retries branch 2 times, most recently from 0519549 to 804f58c Compare February 13, 2025 01:58
@ekzyis
Copy link
Member Author

ekzyis commented Feb 13, 2025

dea2e7a now always retries, even without send wallets, to make sure that failed payments eventually show up in notifications.

However, there was an issue if a stacker goes offline longer than we would try automated retries. In that case, the payment counter will not reach the maximum and we need to check cancelledAt to show the notification.

This works to show the notification in /notifications, but not for the indicator, since the indicator assumes that updatedAt will get updated but it doesn't since the invoice isn't updated in a query.

To fix this, I added a retryTimeout job in ddc115e. I will sleep over this to make sure updating the invoice like this doesn't mess with anything.

You can test this case with the following patch which does the following thing:

  • faster notification indicator
  • do not set userCancel for QR code cancel
  • faster retry timeout
  • don't poll for retries
diff --git a/components/use-has-new-notes.js b/components/use-has-new-notes.js
index dcc843f3..7d489056 100644
--- a/components/use-has-new-notes.js
+++ b/components/use-has-new-notes.js
@@ -11,7 +11,7 @@ export function HasNewNotesProvider ({ me, children }) {
     SSR
       ? {}
       : {
-          pollInterval: NORMAL_POLL_INTERVAL,
+          pollInterval: 1_000,
           nextFetchPolicy: 'cache-and-network',
           onCompleted: ({ hasNewNotes }) => {
             if (!hasNewNotes) {
diff --git a/components/use-qr-payment.js b/components/use-qr-payment.js
index dddc53e9..de766618 100644
--- a/components/use-qr-payment.js
+++ b/components/use-qr-payment.js
@@ -20,7 +20,7 @@ export default function useQrPayment () {
       let paid
       const cancelAndReject = async (onClose) => {
         if (!paid && cancelOnClose) {
-          const updatedInv = await invoice.cancel(inv, { userCancel: true })
+          const updatedInv = await invoice.cancel(inv, { userCancel: false })
           reject(new InvoiceCanceledError(updatedInv))
         }
         resolve(inv)
diff --git a/lib/constants.js b/lib/constants.js
index 9ec20245..e35dc2f5 100644
--- a/lib/constants.js
+++ b/lib/constants.js
@@ -201,8 +201,8 @@ export const WALLET_CREATE_INVOICE_TIMEOUT_MS = 45_000
 // interval between which failed invoices are returned to a client for automated retries.
 // retry-after must be high enough such that intermediate failed invoices that will already
 // be retried by the client due to sender or receiver fallbacks are not returned to the client.
-export const WALLET_RETRY_AFTER_MS = 60_000 // 1 minute
-export const WALLET_RETRY_BEFORE_MS = 86_400_000 // 24 hours
+export const WALLET_RETRY_AFTER_MS = 5_000 // 1 minute
+export const WALLET_RETRY_BEFORE_MS = 10_000 // 24 hours
 // we want to attempt a payment three times so we retry two times
 export const WALLET_MAX_RETRIES = 2
 // when a pending retry for an invoice should be considered expired and can be attempted again
diff --git a/wallets/index.js b/wallets/index.js
index d72227b5..94b8043d 100644
--- a/wallets/index.js
+++ b/wallets/index.js
@@ -254,6 +254,7 @@ function RetryHandler ({ children }) {
   useEffect(() => {
     // we always retry failed invoices, even if the user has no wallets on any client
     // to make sure that failed payments will always show up in notifications eventually
+    return
 
     const retryPoll = async () => {
       let failedInvoices

worker/wallet.js Outdated Show resolved Hide resolved
@ekzyis ekzyis requested a review from huumn February 13, 2025 18:55
@huumn
Copy link
Member

huumn commented Feb 13, 2025

This works to show the notification in /notifications, but not for the indicator, since the indicator assumes that updatedAt will get updated but it doesn't since the invoice isn't updated in a query.

To fix this, I added a retryTimeout job in ddc115e. I will sleep over this to make sure updating the invoice like this doesn't mess with anything.

I'm surprised the indicator query couldn't be changed instead.

@ekzyis
Copy link
Member Author

ekzyis commented Feb 13, 2025

Since possibly receiving a notification about a failed payment after 1 day is bad UX, I changed it to stop automated retries after one hour.

I'm surprised the indicator query couldn't be changed instead.

I think this is the first case where we want to show something in notifications after a timeout. This means it already existed in the database and wasn't updated between last poll and when we want to show it.

I considered to calculate updated_at on the fly using a CASE statement but I suspect that would mess with index usage.

And we can't create an index with an expression that depends on the current time afaik 🤔

So not sure how to change it. Using a job to update the timestamp makes everything work as expected.

However, we could add a new final FAILED state. That would also be an async job though.

Copy link
Member

@huumn huumn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I understand why you added the job. I think it can be fixed by making the notification and indicator queries more complicated - which sucks. We'd basically need a separate queries for userCancel or not and have to fuss with lastChecked and sortTime using WALLET_RETRY_BEFORE_MS if userCancel = false. Probably not much better than your job but it'll at least be in two places (indicator/notifications) rather than four (paidAction,indicator,notifications,worker)? I think without a refactor this is kind of where we are going to end up with this stuff.

I don't know if it's worth making those changes. Up to you. Just let me know about the ITEM_CREATE thing and I'll give it a final QA.

api/resolvers/wallet.js Outdated Show resolved Hide resolved
@@ -567,6 +562,31 @@ export default {
return true
}

const invoiceActionFailed2 = await models.invoice.findFirst({
Copy link
Member Author

@ekzyis ekzyis Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled with writing a good comment since I already struggled with grasping this query, lol.

I think this didn't break anything but I didn't fully QA everything again.

Will do so after some sleep.

@ekzyis ekzyis requested a review from huumn February 14, 2025 02:08
@ekzyis
Copy link
Member Author

ekzyis commented Feb 14, 2025

Oh, I am still missing the fix for sortTime

Probably not much better than your job

btw, I think your suggestion is much better than my job

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature new product features that weren't there before wallets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

automatic retries of failed paid actions
2 participants